rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions
Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye
Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a "fully observable" problem--a belief MDP--and exploiting the piece-wise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex). This approach has been extended to solving ρ-POMDPs--i.e., POMDPs with information-oriented criteria--when the reward ρ is convex in the belief simplex. General ρ-POMDPs can also be turned into "fully observable" problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with λ_ρ-Lipschitz reward functions, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI, which are empirically evaluated on various benchmark problems.
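To make the belief-MDP construction concrete, here is a minimal sketch of the Bayes belief update that serves as the belief MDP's transition function; the tabular arrays T and O, their axis conventions, and the function name belief_update are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayes update of a belief after taking action a and observing o.

    b: belief over states, shape (S,)
    T: transition model, T[a, s, s2] = P(s2 | s, a), shape (A, S, S)
    O: observation model, O[a, s2, o] = P(o | s2, a), shape (A, S, Z)
    Returns the next belief, shape (S,).
    """
    predicted = b @ T[a]                   # predict: P(s2 | b, a)
    unnormalized = O[a, :, o] * predicted  # correct: weight by observation likelihood
    norm = unnormalized.sum()              # = P(o | b, a)
    if norm == 0.0:
        raise ValueError("Observation has zero probability under (b, a).")
    return unnormalized / norm
```

In a plain POMDP, the belief-MDP reward is linear in b (the expectation of the state reward), so the PWLC machinery applies; in a ρ-POMDP it is replaced by a reward ρ(b) defined directly on beliefs, which is what motivates the Lipschitz analysis.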
Reviews: rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions
The paper addresses the problem of rho-POMDPs with non-convex reward functions, proving that in some cases these rewards, and the resulting value functions, are Lipschitz-continuous (LC) for finite horizons. The paper also proposes and uses a more general vector form of LC. This result allows value function approximations of the optimal V* to be used, as well as upper and lower bounds (U and L) on the value as in HSVI, and enables a wide array of new algorithms. This is analogous to the PWLC result for standard POMDPs: since LC is more general, it allows similar contraction operators with Banach's fixed-point theorem as in (PO)MDPs, and finite-horizon approximations of the infinite-horizon objective criteria. Once the paper establishes the main result, it discusses approximations of U and L using a min or max, respectively, over sets of cones.
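As a rough illustration of those cone-based bounds, the sketch below stores a set of (belief, value) pairs and bounds the optimal value function by a min over upward cones (for U) or a max over downward cones (for L). The 1-norm metric, the scalar Lipschitz constant lam, and the array shapes are simplifying assumptions; the paper itself works with a more general vector form of Lipschitz continuity.

```python
import numpy as np

def cone_lower_bound(b, anchors, values, lam):
    """L(b) = max_i (v_i - lam * ||b - b_i||_1): max over downward cones."""
    dists = np.abs(anchors - b).sum(axis=1)  # 1-norm distance to each anchor belief
    return np.max(values - lam * dists)

def cone_upper_bound(b, anchors, values, lam):
    """U(b) = min_i (v_i + lam * ||b - b_i||_1): min over upward cones."""
    dists = np.abs(anchors - b).sum(axis=1)
    return np.min(values + lam * dists)

# Tiny usage example on a 2-state belief simplex (hypothetical numbers):
anchors = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # sampled beliefs b_i
values = np.array([1.0, 0.4, 0.7])                        # value estimates at b_i
b = np.array([0.8, 0.2])
assert cone_lower_bound(b, anchors, values, lam=0.5) <= cone_upper_bound(b, anchors, values, lam=0.5)
```

If V* is lam-Lipschitz in 1-norm and each v_i equals V*(b_i), both functions are valid bounds by the triangle inequality, which is what lets them play the role of alpha-vectors and sawtooth approximations in an HSVI-style loop.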